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ABSTRACT 

We describe different methods for estimating the bispectrum of Cosmic Microwave 
Background data. In particular we construct a minimum variance estimator for the 
flat-sky limit and compare results with previously-studied frequentist methods. Ap¬ 
plication to the MAXIMA dataset shows consistency with primordial Gaussianity. 
Weak quadratic non-Gaussianity is characterised by a tunable parameter f^L, corre¬ 
sponding to non-Gaussianity at a level ~ 1C )~ 5 /nl (ratio of non-Gaussian to Gaussian 
terms), and we find limits of \/nl\ < 950 for the minimum-variance estimator and 
\Jnl\ < 1650 for the usual frequentist estimator. These are the tightest limits on 
primordial non-Gaussianity which include the full effects of the radiation transfer 
function. 

Key words: cosmic microwave background- statistics 


1 INTRODUCTION 

A primary aim of contemporary cosmology is to explain the origin of structure in the universe. The current standard cos¬ 
mological model has been able to explain a number of critical observations such as the recession of galaxies, the abundance 
of light elements and the relic, black-body radiation. A key assumption is that the universe is uniform on large scales. We 
know however that the distribution of matter is not homogeneous and it is therefore necessary to construct extensions to the 
standard model which can explain these irregularities. 

All theories for the inhomogeneity of space and matter rely on mechanisms which generate perturbations of a stochastic 
nature. The theory of inflation relies on the amplification and conversion of small scale quantum fluctuations into classical 
fluctuations of space time on large scales. Alternatively, theories of topological defects lead to the generation of classical 
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inhomogeneities through thermal phase transitions in the early universe; in this case the stochasticity is due to the thermal 
fluctuations in the very early universe and the highly non-linear evolution of the topological defects. Other alternatives 
have been proposed in the recent years, based on the assumption that we live in a three-dimensional membrane in a higher 
dimensional space-time. The interactions and collisions between various “branes” in this cosmos will lead to fluctuations in 
our space time. It must therefore be clear that to identify the correct theory of structure formation one must attempt to 
characterize its stochastic properties. This must be done in terms of our observable universe. 


An excellent observable for undertaking such an analysis is the Cosmic Microwave Background (CMB). Photons from 
the time of recombination will have an imprint of fluctuations in the density of radiation and baryons at that time. These 
fluctuations are sufficiently small that one can consider them as linear perturbations on a homogeneous space-time. Hence 
one should be able to characterize the stochastic nature of inhomogeneities without it having been obscured by subsequent 
non-linear gravitational collapse. Measurements of the CMB have already led to a major breakthrough in tackling the problem 
of structure formation. The COBE satellite showed that the variance of fluctuations on large scales is consistent with the 
principle of cosmological scale-invariance (Bennett et al. 1996): the variance of fluctuations in the gravitational potential (a 
generalization of th e Newtonian potential) at early time is the same on all scales. Recent measure ments from the TOCO 
(Torbet et al. 1999), MAXIMA (Hanany et al. 2000), BOOMERanG ( Bernardis et al. 2000 ), DASI ( Halverson et al. 2002 ), 
VSA ( Scott et al. 2002 ) and CBI ( Pearson et al. 2002 ) experiments, of the variance of fluctuations on smaller scales have 
revealed a preferred scale of 1° consistent with adiabatic initial fluctuations in a Euclidean space. It is believed that in the 
next few years an accurate picture of the fluctuations in the CMB down to sub-arcminute scales will emerge. 


Given the quality of the data it is timely to undertake a more complete characterization of the statistical properties of 
inhomogeneities through an analysis of the CMB data. In addition, primordial non-G aussianity should be easier to detect in 
the CMB than in large-scale structure ( Verde, Wang, Heavens, fe Kamionkowsk: 200C). In all cases of interest, the Probability 
Distribution Function (PDF) can be completely described in terms of its spatial n-point correlation functions, which are the 
expectation values of all possible products of the random field with itself at different points in space. Under the assumption 
of statistical isotropy and homogeneity, it is normally more useful to characterize the PDF in terms of higher order moments 
of the Fourier transform of the field. Most readers are familiar with the 2-point moment, the power spectrum of fluctuations 
( Ci ). Indeed current efforts in the analysis of CMB data have focused mainly on increasingly precise estimates of the angular 
power spectrum. For perturbations induced by Inflation, which is the current favourite model of structure formation, the CMB 
power spectrum plays a central role, since the statistics are very close to Gaussian and all non-zero moments of order n > 2 
can be expressed in terms of the C(. 


In this paper we will present a detailed analysis of the bispectrum. The bispectrum is the cubic moment of the Fourier 
transform of the temperature field and it can be seen as a scale-dependent decomposition of the skewness of the fluctuations 
(in much the same way as the Ct is a scale dependent decomposition of the variance of fluctuations). The bispectrum can 
be used to look for the presence of a non-Gaussian signal in the CMB sky. We use the data collected with the MAXIMA-1 
experiment (Hanany et al. 2000) to quantify the bispectrum of the CMB. The Gaussianity of this data set has already been 
analysed using complementary methods in Wu et al, (2001), including the methods of moments, cumulants, the Kolmogorov 
test, the x 2 test, and Minkowski functionals in eigen, real, Wiener-filtered and signal-whitened spaces and with one particular 
estimator of the bispectrum (Santos et al. 2002). In the past few years, interest in the bispectrum has grown in the scientific 
community. Estimates of the bispectrum in the COBE data proved the statistics to be extremely sensitive to some non- 
Gaussian features in the data, be they cosmological or systematic (Heavens 1998); the quality of galaxy surveys has made 
it possible to test for the hypothesis that the matter overdensity is a result of non-linear gravitational collapse of Gaussian 
initial conditions (|5coccimarro et ah 


2001; Feldman et al. 


2001). On the other hand a serious effort has been undertaken to 


calculate the expected bispectrum from various cosmological effects; secondary anisotropies, such as the Ostriker-Vishniac 
effect, lensing, Sunyaev-Zel’dovich effect (Spergel & Goldberg 1999| ; |Goldberg fe Spergel 1999; Cooray & Hu |2000| ), as well 
as primordial sources, such as non-linear c orrections to inflationary perturbations o r cosmic seeds, may lead t o observable 
sign atures in the bispectrum of the CMB (Luc 19941 J; Verde et al. 200C ; laffc 1991 ; Sangui fe Martin 2000b|; Bean et al 


1991 ; Contaldi fe Magueijo [2001 ; |Gangui et al 
200 1 |). 


2001 


Wang fe Kamionkowski 


200C; 


Komatsu et al. 2002; Komatsu fe Spergel 


In this paper we study some possible estimators for the bispectrum and use them to check for Gaussianity of the MAXIMA 
CMB map and to constrain possible theoretical models of the bispectrum. After a few definitions, we discuss in section jij 
the minimum variance cubic estimator and develop its application in the the small sky approximation. We then use this 
estimator to analyse the MAXIMA data and test for the Gaussian hypothesis. In section [| we review the standard frequentist 
estimator and apply it to the same dataset so as to compare with the previous method. Finally in section ^ we discuss the 
weak non-linear coupling model as a specific source for a primordial bispectrum and use the previously obtained values to 
constrain the free parameter in this theory. 
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2 FORMALISM AND NOTATION 

The bispectrum is defined to be the harmonic transform of the 3-point correlation function: 

3 = { a l\™.i a i2m 3 a l 3 m 3 ), ( 1 ) 

where the angle brackets indicate ensemble averages and the CMB temperature fluctuation field, AT(fl) was expanded into 
spherical harmonics 


a lm = / dHAT(fI)Y/ m (n). 


( 2 ) 


£1 corresponds to the spherical polar coordinates ( 6 , (j >) and dfl represents an element of solid angle. Rotational invaria nce 
of th e Universe together with invariance under reflections means we can write the bispectrum in the following form (see Hu 
200 l]): 


B 


m l m 2 rn 3 s->m\rri2m3 7 

'^1^3 _ ^*1*2*3 C 


(3) 


where be 1 e 2 e 3 is a real symmetric function of and Qffifft 1 ™ 3 is the Gaunt integral defined by 

0W3 2m3 = / dO n imi (0)Ye 2 m 2 (n)Y e3m3 (fi) 

f2£ 1 + l)(2£ 2 + l)(2£ 3 + l) (t 1 £2 £3 


47T 


r£i £2 4\ / £1 £2 £3 \ 
V 0 0 0 ) Vm 1 m 2 m 3 ) 


(4) 


GTdif f i™ 3 is a pu rely geometrical term and will impose all the well known selection rules for the bispectrum (e.g. Edmond: 


1957, Luo 1994a) - it will be zero unless 


(i) £\ + £2 + £3 = even 

(ii) mi + m 2 + m 3 = 0 

(iii) \£i - £j\ < 4 ^ U + £j for i, j, k = 1, 2, 3. 


All t he relevant cosmological information will be contained in be 1 e 2 e 3 , called the reduced bispectrum ( Komatsu fe Sperge] 
2001 ), and this is the quantity we try to measure experimentally so as to compare with possible theoretical predictions. In 
the literature it is also common to use the angle-averaged bispectrum, related to the reduced bispectrum by 


B, 


m l m 2 m 3 


/ £1 £2 £3 \ 
Vmi m 2 m 3 ) 


pmim 2 m 3 

D <1<2<3 


(2£i + 1)(2£ 2 + 1)(24 + 1 ) (£1 £2 £3 


47T 


(h £2 £s \, 

l 0 0 0 J bllt 


(5) 


In the case of the MAXIMA-1 experiment (Hanany et al. 200C), the patch analysed is nearly flat, and thus it will be more 


natural to work in the small sky approximation where a map of the CMB can be considered approximately flat (White et al 
200C). We can then expand AT(x) in terms of the 2-dimensional Fourier modes (after the flat-sky conversion £1 — > x), 


A T(x) = 


d k , ik . x 

r a(k)e . 


, (Stt ) 2 

The Fourier transform of the 3-point correlation function will be given in this case, by 
(a(ki)a(k 2 )a(k 3 )) = (27r) 2 J5(fci, k 2 , fc 3 )<5 2 (ki + k 2 + k 3 ), 


( 6 ) 


(7) 


where the 2D delta function S 2 is a consequence of assuming translational invariance for the 2-dimensional surface. Again 
B(ki, k 2 , fc 3 ) is a real function, invariant under permutations of ki,k 2 ,k 3 . Invariance under rotations and reflections on 
the plane means B(ki, k 2 , k 3 ) will only depend on the moduli of the k vectors. On small angular scales there is a simple 
correspondence between the flat sky bispectrum and the full sky angular bispectrum, 


biii 2 i 3 ~ B(ki, k 2 , k 3 ), ( 8 ) 

with |ki| = £i (see Appendix^). Therefore, in the flat-sky approximation, there is no need to keep track of the Wigner 3-j 
symbols. Throughout the paper when we refer to the bispectrum in the flat-sky limit we will mean the above definition, that 
will correspond, to a good approximation, to the so-called reduced bispectrum. 
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3 AN APPROXIMATE MINIMUM VARIANCE ESTIMATOR 
3.1 General setup 


We seek an estimator of the reduced bispectrum, be 1 i 2 e 3 , for an arbitrary set of pixelized temperature measurements in a 
CMB map, with Gaussian noise with arbitrary correlations. Ideally, the estimator should be unbiased, and optimal in the 
sense that its error is as small as possible. If possible, it should also be lossless, so that it contains as much information as 
the original CMB map and have calculable statistical properties. We want to obtain an expression for the minimum variance 
estimator without assuming anything specific to the map being used. We start with a set of measurements {AT;}, making 
the pixelized map (AT, is the temperature fluctuation in a sky pixel labeled by i). The actual shape of the map, or whether 
we use a full sky or “flat-sky” analysis, will be irrelevant for this derivation. 

As in Heavens (199S), and in the spirit of the optimal quadratic estimator for the power spectrum in Tegmark (1997), 
we seek an estimator for the bispectrum which is cubic in the AT;. The actual derivation of this estimator is reviewed in 
Appendix |b| in the context of a full-sky analysis. Here we just show the main steps to recover the minimum variance cubic 
estimator. We start by considering quantities y a of the following form 


y a = 

pixels ijk 


ET jk AT) AT; AT) 




(9) 


where the Efj k are weights to be determined, and a is a shorthand for the triplet {t\. ii. £ 3 }. The y a are related to the 
bispectrum estimates, but will not be the bispectrum estimates themselves. 

The mean of y a involves the 3-point correlation function, which in turn, should be related to the reduced bispectrum, b a 
as follows: 


(ATiATjATk) = Y, b a Qijk 

at 


( 10 ) 


The expression for Q^jk depends on the actual basis being used to decompose the temperature measurements. It is defined in 
the appendix (eq. B3), for the full sky setup, but is not important here. The ensemble average of y is then 


{y a) = 'Y ba ' Fa 


( 11 ) 


where 

F aa ’=Y J QiikE?jk- ( 12 ) 

ijk 


F a a' is the Fisher and covariance matrix of the quantity y a . We obtain the weights for the optimal estimator of the reduced 
bispectrum, by minimizing the error on y a (e.g. ( y a y a '))• l n order to effect this, we assume that the field is close to Gaussian, 
and approximate the 6 -point correlation function by that for a Gaussian field (see Appendix B). The optimal weights are 


Fljk = 


rV- 1 - 

>7 1 S>kk' 


2 + 3 n ^ 


^jk £ 7 'k' 


Qi'i 


i'j'k' 


(13) 


where the summation convention is assumed and n is the number of pixels used. This is obtained by considering that the 
probability distribution function for the temperature field is close to a Gaussian. G .7 is then the 2-point correlation function 
of this temperature field, Gj = (AT;AT)). Finally, the estimator for the reduced bispectrum will be 


K = F-l,y a ,. (14) 

Hence to obtain the bispectrum, we just need to calculate whose expression is defined by equation (|Icj). We then obtain 

G,' by using the already-measured power spectrum and plug all this into equation (|l3|). Finally, we obtain y a using equation 
(jOi) which can, at least in principle, be inverted to get the bispectrum (eq. [l4| ). The derivation obtained here is symbolic and 
does not depend specifically on the characteristics of the map. In the flat-sky approximation it will be possible to obtain an 
expression equivalent to equation ([h^) and then all the above derivation will follow through in the same way. 


3.2 The flat case 

In the previous section we discussed what should be the minimum variance cubic estimator for the bispectrum in a general 
setup. This estimator could in principle be applied to the MAXIMA dataset by expanding the temperature map into spherical 
harmonics and calculating QTj, in equation (see appendix |c|). Since we will be looking at a small patch of the sky, it 
is possible to use the flat-sky approximation, and we will now derive the expression for the best cubic estimator using this 
approximation. 

The temperature measured by MAXIMA (AT s (x)) can be expressed in the following way: 
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A r s (x) = W(x) 


J d 2 i , AT(x , )B(|x — x'j) + ATjv(x) 


(15) 


where B(x) represents the effect of the beam smearing and pixelisation and W (x) is the window function defining the field of 
observation. AT(x) is the “true” temperature of the sky (the one we would like to measure) and ATat(x) is the temperature 
fluctuation due to the noise. 

Using the Fourier convention in equation ([]), the measured Fourier modes are related to the underlying, “true” temper¬ 
ature ones, by 

d 2 k' ~ 


a s (k) = 


(271 


■ W (k - k') [a(k') B(k') + ojv(k')] . 


(16) 


This expression shows clearly the problems associated with the extraction of information from the temperature map: there will 
be a contamination due to the noise and beam. Even if we ignore these contributions, the a(k) will necessarily be convolved 
by the window function and we can never hope to measure a single mode without full sky coverage. 


To obtain the estimator one first needs to calculate as it appears in equation Q1CJ). The precise steps for the derivation 
of the estimator in the flat-sky approximation can be found in appendix |c| Here we just present the main expressions. Assuming 
the noise bispectrum is zero, we can express the 3-point correlation function as 


(AT S (x;) ATs (xj) ATI, (x*,)) = 


ro o pli 

L de 'k 


<u 2 


d£ 3 B(£i,£ 2 ,£ 3 ) 


xB(£ i)B(£ 2 )B{£ 3 ) 


E 

perm. 

O 1 .e 2 .e 3 } 


dd 1 lZe{Rijk(£i, £2, £3, # 1 )} 


(17) 


Where the sum is over all the possible permutations of l\, £ 2 and £ 3 . The triangle relations are automatically imposed by 
the limits of integration (x; defines the position of pixel i in two dimensions). Rijk(£i,£ 2 ,£ 3 ,di) will basically correspond to 
a product of exponentials, for each pixel index, of the type e lk x , where |k| = £ (eq. C3 and eq. |Cl| ). This expression should 
be compared to equation The sum for each multipole was replaced by an integral as expected. We can now read 
from the above expression (approximating the integral by a sum) and following the previous discussion, the expression for y a 
becomes: 


y{£i.£ 2 .£z) = B{£i)B{£ 2 )B{£ 3 ) 


perm. 

{<1 .< 2 .< 3 } 


d0i1Ze{Rit jiy [£l,£ 2 ,£ 3 , 61) 




t-ic-l _ 

^>jj' 3 kk f 


2 + 3 n 


£jk £ j'k’ 


AT s (xi)AT’ s (xj)AT s (xfc)} 


(18) 


The fact that the matrix Rijk{£i,£2,£ 3 ,9i) can be decomposed into a product of objects each with just one pixel index, 
represents a major simplification in computational terms. Then, the sum over the pixels for R, : jk in equation ( |l8| ) can be 
decomposed into just a product of three operations of order n. What is this expression actually doing? To understand it, let 
us concentrate on the integrand of #1 and first assume the pixel values are uncorrelated (i.e. Then the first 

term, involving £.~f ^k 1 ’ J us ^ w h & t one would do naively, i.e. discrete Fourier transform (DFT) of the data for each of 

the three wave vectors (whose sum is zero) and multiply the coefficients together. The second term (with the 2 f 3n factor) 
would be a product of one DFT of the data with a DFT of the window function (basically a s (k) W( k)). This term should 
be small for ks larger than the fundamental mode (as defined by the size of the field of observation). As discussed later on, 
this argument has been backed up with some numerical results and for the modes we analyzed from the MAXIMA data we 
can safely ignore this second term. Looking closer at the first term, and now considering the full covariance matrix, we see 
that it will basically correspond to a product of three DFTs not of the map itself, but of Zi = £“ 1 AT(xj). This will then 
be integrated along 9\ so that the result is rotationally invariant. The advantages of analyzing the map z, instead of the 
original map, have already been discussed in Tegmark (1997). Weighting the pixels by the inverse of the covariance matrix 
will suppress the large scale power contribution and soften the edges of the map. This should help to reduce the “red leakage” 
from lower to higher multipoles and decrease the width of the window function. 

Estimating y is not however the end of the story: we need to relate it to the underlying “true” bispectrum. The connection 
is simply 


(y(£i,£ 2 ,£ 3 )) = J d£[ d£' 2 J d£' 3 F(li,l 2 ,£ 3 AAA) B(4,4,4), 

where, assuming the almost Gaussian approximation, 

F(£i,£ 2 ,£ 3, £[.£ 2 , £3) = {y{£i.£2.£3)y{£f£' 2 .£'z)). 


(19) 


( 20 ) 
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Having all this information, we can now try to estimate the bispectrum. The next section will describe several of the numerical 
details involved when applying these techniques to the MAXIMA dataset. 


3.2.1 Numerical Implementations 


We are interested in obtaining as many bispectrum values as computationally possible. The first task in estimating the 
bispectrum is to calculate y. Numerical implementation of equation (^) is straightforward. However, the second term in 
the integration of 9\, involving 2 _ t 3 3n GjkGj l k'£iP ' m ^ x pixels from Rijk(£i,£2,£3,0i). Then the sum over the pixels will 
effectively be of order n 2 , which can make the process quite slow. Fortunately, as previously discussed, for high enough 
multipoles this term should be negligible. The MAXIMA map used has a fundamental mode of A l « 40 and we are typically 
interested in modes for l > 110. Numerical tests also show that the contribution of the second term is always less than 0.1% 
and so we choose to neglect this term. 

To estimate the covariance matrix of y, we could use directly equation Jl2l ) (see also eq. C7). However, this proves to 
be somewhat slow, and we decided to use instead Monte-Carlo simulations to quantify the covariance. After obtaining y, 
we need to relate it to the bispectrum (eq. [ifi|). We see the relation requires in principle the knowledge of a 6-dimensional 
object, which is certainly not an easy task. Things become more involved here than when analysing the power spectrum. The 
straightforward way would be to discretize the integral and invert this relation as in equation (|i~4| ). Then the estimator would 
be unbiased: (B(£ 1 ,^ 2 ,^ 3 )) = B(£i,i 2 ,£ 3 ). However this would have problems, not only because it requires the calculation of 
the Fisher matrix for a large number of points, which is time consuming, but also because inverting F aa i is numerically quite 
difficult since the Fisher matrix is almost singular. Instead we used the following estimator for the bispectrum: 

y(£i,£ 2,£3) 


B(£ i,£2,£ 3 ) = 


where 


M(4,<? 2 ,£ 3 )’ 


M(£ i,£ 2 ,£ 3 )= / d£[ 


r t i 


d£'o 


d£' 3 F{£ i,€2,€ 3 ,€i,44)- 


( 21 ) 


( 22 ) 


This normalizes the Fisher matrix, so that 

r°° r e i r l 2 

(B(£ 1 ,£ 2 ,£ 3 )) = / d£[ / d£' 2 / d£' 3 w{£ 1 ,£2,£3,£ l i,£2,£3)B(£' 1 ,£' 2 ,£' 3 ), 

0 -ij- 

with 

...Ip p p p' p' plS- F (? 1 ,^ 2 , 4 , ^ 1 ,^ 2 , 4 ) 
w(£ 1 ,£2,£ 3 ,£ 1 ,£2,£ 3 ) = - A Hh ,e 2 j 3 ) -■ 

w can be considered as the measured bispectrum window function, and satisfies 


(23) 


(24) 


d£[ I, d£' 2 d£' 3 w(£ 1 ,t2,£3,£[,£'2J3) = l- 


(25) 


The larger the patch of the sky analyzed, the more narrow this window should be and eventually one approaches 

(B{£ 1 ,£2,£ 3 ))~B(£ 1 ,£2,£ 3 ). 


(26) 


Otherwise, the estimated bispectrum will just correspond to a weighted average of the target bispectrum. Calculating 
M(£i,£ 2 ,£ 3 ) would require in principle the sampling of the Fisher matrix into a fine enough grid so we can obtain a good 
approximation to the integral in equation (12(3). This is exactly what we would like to avoid. Instead, taking into account the 


expression for the Fisher matrix (eq. C7), it is possible to simplify the integral: 


M(£ 1 ,£ 2 ,£ 3 ) = g / d xB(\xi> - x|)B(|x 3 '/ - x|)S(|x fc / - x|) 


xB(£i)B(t2)B(t 3 ) 


E 

perm. 


ddxlle {Ri'yy (£i,£ 2 ,£ 3 , #i)} ZuHjfGk' 


(27) 


The integration over the beam functions can be eas ily d one if we approximate them by a Gaussian. We then obtain an 
expression that is easy to deal with numerically (eq. |Cll| ). The fit of the MAXIMA beam and pixel window function to a 
Gaussian might have some problems on the tails of the function. However we have checked that this difference only introduces 
small corrections to the calculation of M{£\,£ 2 ,£ 3 ) and can be ignored. Note that M(£\,£ 2 ,£ 3 ) does not depend directly on 
the temperature map {ATi} and therefore we only need to calculate it once. 
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Figure 1. Normalized window function for B(l 1 ,^ 2 ,^ 3 ) (f = 320,330, ...,560). 


3.2.2 The data 


Using the above simplifications, we can now try to estimat e the bispectrum from the MAX IMA data. The M AXIMA-1 
experiment and dataset is described in detail in Hanany et al. (200C). As in the previous paper ( Santos et al. 2002 ) we used a 
map with square pixels of 8’ each but considered a slightly larger area. We used the central region of the map which is 60 deg 2 
and contains « 3300 pixels. This patch has the most uniform sampling and has the highest signal-to-noise ratio. We estimated 
the bispectrum from 1 = 110 up to £ = 740 which corresponds approximately to the same multipole range as in the previous 
analysis. For the size of the steps along this multipole range, in principle one could choose Al = 1. However, not only this 
is very demanding in computational terms, but also we do not gain any extra information in making the bins so fine, since 
there will be strong correlations between the different measurements of order 2 tt/9, where 6 is the linear size of the smallest 
dimension of the sky patch (Bond et al. 1998, Tegmark 1997). Although the region analyzed is not exactly rectangular, it is 
possible to get an estimate of the fundamental frequency mode using the above expression, which gives A l ~ 40. This can 
be confirmed by looking at Fig. [I] were we show the window function (eq. H) for a few multipole values. As expected, the 
width of the window is approximately twice this fundamental mode. Figure 0 gives an indication of the level of correlations 
between different measurements of the bispectrum. We have chosen a step of A l = 30 which will make it easier later on when 
comparing to theory. We can always combine the values into larger bins to decrease the correlations. 


3.2.3 Results 


Taking into account all the bispectrum symmetries, our choice of multipole range and step size, gives a total of 1409 bispectrum 
estimates. 

To measure the covariance matrix, < B(l\,li,l-s)B(l!\,l!^,l!-i) >, we use Monte-carlo (MC) simulations of the MAXIMA-1 
dataset. In the near-Gaussian approximation we are considering, the variance of the bispectrum is determined by the input 
power spectrum. To generate the MC maps we used a Gaussian signal with the power spectrum from the best fit model 


obtained in Balbi et al 


00C| ) : flcdm = 0.6,126 = 0 . 1 , 12 a = 0.3, n = 1.08 and h = 0.53 (normalized to the MAXIMA-1 
and COBE/DMR data). We also took into account the effects of pixelization and finite beam and included the full noise 
correlation matrix from the MAXIMA-1 experiment. We applied exactly the same estimator to 4000 of these simulated maps, 
obtaining 1409 modes for each realization. The values for the covariance matrix actually converge quite rapidly, so that 4000 
maps should be enough. 

The values obtained turn out to be too noisy and highly correlated, so that we decided to combine them into larger bins 
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Figure 2. Correlation between B( 440, 440, 440) and B( 440, 440, £). Multipole values are the same as in Fig. |l|. The rounded peak is due 
to the sharp increase of the variance for the diagonal term, B( 440, 440, 440). 


of size A£ = 60. We grouped the values in the following way: 
B Ll L 2 L 3 = Ar _ y: B{£ 1 ,^ 2 ,G), 


Nl\L 2 l 3 


(28) 


iSh 


where L = [Li — A£/ 2 , Li + A£/2] and L\ ^ L 2 ^ L 3 . Nl 1 l 2 l 3 is the number of estimated bispectra that fall in the bin and Li 
will cover all the multipole range with step A l. Note that values obtained through this binning do not have to necessarily obey 
the triangle relations for Li. We have a total of 216 binned values and applied the same process to the MC maps to calculate 
the covariance matrix. In Fig. we plot the corresponding diagonal values together with the 68% and 95% confidence limits 
obtained if the sky was indeed Gaussian. Note that this is only a small sample of all the bispectrum values. The variance is 
expected to be smaller for the non-diagonal terms. Although the variance for the estimator can only be obtained numerically, 
it is possible to have a rough idea of why the 68% contour follows the shape shown in Fig. If non-Gaussianity is small, the 
cosmic varian ce of the bispectrum, in the full sky setup, is given in terms of the two-point function of the ai m (Luc 1994 e; 
Heavem] 199S )• The variance of the angle-averaged bispectrum (eq. |s|) is then calculated as jGangui fc Martin |2000a )) 


(Bp ^2^3} ~ Ciii CV 2 Ci 3 A £ 3 £ 2 £ 3 


(29) 


where Ae 3 £ 2 £ 3 takes the values 1, 2, and 6 depending whether all £’s are different, two of them are same or all are the same, 
respectively. Ce is the total CMB angular power spectrum, including the noise. On the other hand, the estimator we are using 
in the flat-sky approximation should correspond approximately to an integration over rotations of products of three Fourier 
modes (see eq. fij). We therefore expect its variance to follow the same features and be approximately proportional to 

1 


dfiC tl )(eZCMCt 3 ) A ei £ 2 £ 3 . 


(30) 


(hGG ) 2 

The quantity £ 2 Ce is similar to the standard expression for the temperature fluctuation for large multipoles: l(t + 1)C , e/27t, 
giving the Sachs-Wolfe plateau. If we then plot l 2 £ 2 (-3 (B f, t t ), one should find a behavior close to the observed for the CMB 
power spectrum, multiplied by the relevant A£ x £ 2 £ 3 . Figure 0 shows exactly that for l\ = £2 = 440. One might think the peak 
at £3 = 440 is due to some sharp increase of the power spectrum, however an inspection of the current measured CMB power 
spectrum, shows this is not the case. Instead, the increase is simply due to the fact all £’s are the same for the bispectrum, so 
that Ae 1 £ 2 £ 3 = 6, contrary to Ai x £ 2 £ 3 = 2, making the variance for the diagonal bispectrum three times bigger than the other 
values. Also, if w e look at fig. [i|, the 6 8% contour should be proportional to ( t 2 Cf) i ^ 2 and follow the features of the MAXIMA 
power spectrum ( Hanany et al. [200(]| ): there is a peak at £ ~ 220 and a steady increase of the variance for large £ due to the 
increase of the noise power spectrum. 
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Figure 3. The minimum variance estimator - comparison with a Gaussian sky. The solid (dashed) line delimit the 68% (95%) confidence 
region determined from a Monte Carlo simulation of a Gaussian sky and the dots are the values obtained from the MAXIMA dataset. 


We would like to stress that there are two objectives in measuring the bispectra. First we want to compare the estimates 
against the Gaussian hypothesis and second we want to use this same measurements to constrain theories of structure formation 
that generate non-Gaussianity. We start by testing the Gaussian assumption. If the temperature anisotropies have a Gaussian 
distribution then the bispectrum estimates will obey 

(Bl 1 l 2 l 3 )=0 (31) 

The values we actually measure should fluctuate around this average in a way consistent with the probability distribution 
for the estimator in a Gaussian sky. One way to test if all the values are indeed compatible with the Gaussian hypothesis is 
through the use of the standard \ 2 - This goodness of fit will be appropriate as long as the distributions for the estimator are 
approximately Gaussian. In Fig. ^ we show some of the histogram distributions for Bl 1 l 2 l 3 , obtained from the Monte-Carlo 
realizations of the Gaussian sky. The first thing to note is that indeed we have ( Bl 1 l 2 l 3 ) = 0 as it should be for Gaussian 
temperature fluctuations, so that the estimator is unbiased as expected. Most of the histograms are consistent with a Gaussian 
distribution, although there are some deviations for low multipoles, due to the low number of independent modes contributing 
for these values. However these are only a small numbeij^ and it should not spoil the use of the \ 2 as a goodness of fit (an 
analysis without these values produced the same conclusions). Moreover, the measured yf can still be a robust measure of 
non-Gaussianity, as long as we calculate its full distribution from the MC realizations. We have thus considered the quantity: 

X 2 = 51 (B° bs - B tb ) C~l, ( B°J 3 - B*Jt ), (32) 

OCOt' 

where a = (Li, L 2 , L 3 ), Fff = 0 and C aa i is the covariance matrix of the estimators evaluated from the MC realizations. 
From a total of 216 values we found Xobs ~ 260. Using 4000 MC maps we constructed the corresponding \ 2 distribution (Fig. 
^j) and found the probability of the MC \ 2 being larger than Xobs to be Prob (xmc > Xobs ) ~ 15%. Thus, the MAXIMA map 
seems to be reasonably consistent with the Gaussian hypothesis. 

Although there was no detection of clear deviation from Gaussianity, one can still use the measured bispectrum to 
constrain theories of structure formation. This subject will be specifically addressed in a later section. 


1 4 out of 216 have 95%/68% > 2.2, where 95% and 68% correspond to the calculated confidence limits. 
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Figure 4. The variance of the bispectrum from the minimum variance estimator. Note the sharp increase at l 3 = 440 due to the diagonal 
component. 


4 THE STANDARD FREQUENTIST ESTIMATOR 


In the previous section we discussed an estimator for the bispectrum, cubic in the temperature pixels, obtained by minimizing 
the variance. This method deals automatically with partial sky coverage and should have the smallest error bars. However, 
its numerical implementation is somewhat complex and the computer calculations can be quite slow. We can consider instead 
a more direct approach to measure the bispectrum: do a harmonic transform or a Fourier transform of the map and consider 
cubic products of the ag, m or the a(k) as in equation (Q) or (:?]), combining them in an appropriate way. Although this estimator 
will not necessarily have the lowest variance possible, its numerical implementation is quite straightforward and extremely fast, 
making the method more feasible when dealing with very large datasets. We will therefore review the so-called frequentist 
estimator, as introduced by Ferreira, Magueijo, & Gorski (1998). Although its basic description was already addressed in 
(2002), it will be interesting to apply this method to exactly the same map used in the previous section and see 


Santos et al 


how well it will do compared to the previous estimator. 

In a full sky setup it is common to combine the measured ae„ 


into the angle-averaged bispectrum (see |Komatsu et al 


2005 for a description of the method). The relation to the reduced bispectrum was specified in equation ( 
already explained, it is more natural to use the flat-sky approximation for the MAXIMA dataset. 


However, as 


4.1 Flat case - FFT estimator 

The Fourier transform of the 3-point correlation function for the temperature anisotropies measured by MAXIMA will not 
be given directly by equation (Q), but instead (see eq. E> 

(o s (k 1 )« s (k 2 )a s (k 3 )) = J ^^W(k 1 -k' 1 )W(k 2 -k' 2 )W(k 3 + ki+k^ 

x B{k ' 2 ) Bflki - K\) B(ki, k 2 , |ki - k' a |)] . (33) 

where it was assumed that the noise bispectrum is zero. The quantity in parenthesis should not change much along the 
integration, given the shape of the window function, and one can then simplify the integral: 

(a s (ki)a s (k 2 )a s (k 3 )) « [B(ki) B(k 2 ) B(k 3 ) B(ki,k 2 , k 3 )] V , (34) 

with 


V = / d 2 ®IF 3 (x) 


(35) 
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Figure 5. Histograms of the minimum variance bispectrum estimator. Solid lines correspond to the best fit Gaussian distribution. 
Vertical lines represent the values obtained from the MAXIMA map. 


and ki + k 2 + k 3 = 0. So, to estimate the bispectrum, we should consider quantities like 

a c (ki)a c (k 2 )a c (k 3 )/V, (36) 


where a c (k) = a s (k )/B(k), so that the corrections due to the finite size of the map and the effects of the beam and pixelisation 
are already taken into account. Furthermore, the bispectrum is invariant under rotations and reflections, so that we need to 
combine values contributing to the same bispectrum. To measure B(£i, £ 2 , £ 3 ), we could then use 


-B(£l, £ 2 , £3) = 


d6»r [o c (k 1 (fl 1 ))a c (k 2 (fl 1 ,6>))a c (k 3 (6» 1 ,6>)) 


+a c (k 1 (6h))a c (k 2 (0i, —6>))a c (k 3 (0i, -0))], 


(37) 
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X 2 (0) 


Figure 6. x 2 distribution (as defined in eq. p2[ ) for the minimum variance estimator. The vertical line corresponds to the measured x 2 
( 260 ). 


where ki(0i) = £i(cos 9i, sin 9i), k2(0i,f?) = £ 2 (cos( 0 i + 0),sin(#i + 9)), ^. 3 ( 61 ,8) = — ki(#i) — k2(#i,0) and 9 is such that 
|k 3 1 = £ 3 . For each set of (£i,£ 2 ,£ 3 ) one would have in principle to calculate several o s (k) so as to perform the integral above. 
However, it is more efficient to first obtain all the Fourier modes by doing a Fast Fourier Transform (FFT) of the map. The 
modes a s (k) will then be discretised on a grid with step size A k determined by the width of the field of observation. This is 
fine, since Fourier modes separated by less than A k will be highly correlated, but also means the bispectrum values will be 
automatically discretised according to the grid. Moreover, we still need to combine the estimates (eq. related by rotations 
into the same measurement and to rearrange the values into bins, so as to lower the noise and reduce the correlations. We 
therefore use the following bispectrum estimator: 


Bl\L 2 l 3 


1 

NL1L2L3 V 


^ 7?.e{a c (ki)a c (k2)ac(k 3 )}, 
|ki|€b 


(38) 


where h = [Li — A£/ 2 , Li + A£/ 2 ], Nl 1 l 2 l 3 is the number of estimated bispectra that fall in the bin and ki + k 2 + k 3 = 0. 
A£ is the size of the bin and again, the Li do not have to obey the bispectrum selection rules. 


4-1.1 The data 

We applied the above estimator to exactly the same map used for the best cubic estimator. Note that this map is not 
exactly rectangular so that to use the Fast Fourier Transform we had to pad the map with extra zeros. We took the extra 
bias introduced by this into account when calculating the bispectrum. The FFT assumes the data has periodic boundary 
conditions. We therefore multiplied the map by a Welch window function to correct for the mismatch at the border of the 
region. This should reduce the leakage between neighboring Fourier modes and from small to large scales. The use of the 
Welch window function turns out to be essential to decrease the size of the error bars. To obtain the probability distribution 
and covariance matrix of the frequentist estimator in equation (^), we again used Monte-Carlo simulations of the MAXIMA-1 
dataset. 


4-1.2 Results 

We applied the estimator in equation ( ^j|) to the data using a bin size of A£ = 60 from L = 125 to L = 725, so as to match 
the same multipole values used for the binned best cubic estimator (eq. ^). This gives a total of 236 bispectrum values. We 
obtained the corresponding covariance matrix by applying the same process to 10000 MC realizations. In Fig. ^ we plot the 
corresponding diagonal values together with the 68 % and 95% confidence limits. Looking at the figure it seems that indeed 
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Figure 7. FFT estimator - diagonal bispectrum from the MAXIMA dataset compared to a Gaussian sky. 


the previous estimator manages to produce smaller error bars than the FFT estimator. This improvement on the signal to 
noise will also be confirmed when looking at the constraints imposed by both estimators to the weak non-linear coupling 
model. Note that th e variances of the di agonal components of the FFT estimator presented here are slightly larger than in the 
previous analysis of Santos et al. ( 2002 ): we have found that this is primarily due to the fact that we are considering smaller 
bins and different regions of the map. 

To test the Gaussian assumption we again considered the standard \ 2 as defined in equation (^). From the MC real¬ 
izations we checked that the estimator is indeed unbiased and has an histogram distribution approximately Gaussian. From 
a total of 236 values we found Xobs « 267. Using 10000 MC maps we constructed the corresponding \ 2 distribution (Fig. ^|) 
and found Prob ( Xmc > Xobs ) ~ 27%. The frequentist estimator also gives results consistent with the Gaussian assumption. 


5 COSMOLOGICAL IMPLICATIONS 

We have been discussing the use of the bispectrum estimates as a way to test for Gaussianity of the MAXIMA-1 dataset. 
However the bispectrum can be applied in more powerful ways: several theories of structure formation give specific, clean 
predictions for the bispectrum, which can then be compared to the data in much the same way as it is done for the power 
spectrum. Even if the temperature anisotropies turn out to be consistent with a Gaussian distribution, we can always use the 
measurements to put constraints on the parameters of these theories. 

Although we are interested in the bispectrum from a primordial origin, the signal can be contaminated with various 
foregrounds, such as extra-galactic radio and infrared sources, the Sunyaev-Zel’dovich (SZ) effect, the weak lensing effect and 
so on. For a discussion of these sources see Cooray & Hu (2000). The Ostriker-Vishniac effect and the coupling between the 


SZ and the weak lensing effects, typically give rise to non-zero bispectra on very small angular scales. Therefore they should 
not be detectable by MAXIMA, with the map we have been considering. The patch of the sky analyzed in the MAXIMA-1 


experiment (Hanany et al. 2000), shows no detectable radio or infrared sources. Moreover, these point sources would produce 
a reduced bispectrum approximately constant (Komatsu & Spergel 2001), which would show up clearly in Fig. (or Fig. |?]) as 
a sharp rise at high £. Finally, the contributions from interstellar dust and synchrotron radiation were shown to be negligible. 
From now on we will assume there are no measurable foreground contributions to the bispectrum and concentrate on the 
possible primordial sources , which can give rise to non-Gaussianity on degree and sub-degree scales. Even if there were such 
contributions, a cancellation of the several signals is unlikely to occur and the constraints obtained should then give upper 
limits. 

The main theory for structure formation in the universe is Inflation and, as already stated, it predicts Gaussian fluctuations 
to a very good approximation. This can be tracked down to the requirement that the inflaton potential must be flat enough 
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Figure 8 . y 2 distribution for the FFT estimator. Vertical line corresponds to x 2 ~ 267. 


so that Inflation can occur. The perturbations in the inflaton field, which are responsible for the generation of cosmological 
fluctuations, should then be linear. This basically means we are neglecting any interaction between the perturbed inflaton 
and other fields (or itself). However, if we took these interactions into account, the different Fourier modes of the above 
vacuum perturbation would couple, leading to non-Gaussian cosmological fluctuations. A few authors have considered this 
possibility and calculated the 3-point correlation function of the inflaton field (Gangui et al. 1994; Falk et al. 1993[ ), which in 
turn generates a non-zero bispectrum. Typically the bispectrum obtained in these calculations, c an be directly created just by 
considering a simple weak non-linear coupling model for the primordial curvature perturbation (Komatsu fe Spergel 2001): 


$(x) = <Fl(x) + f NL ($|(x) - ($|(x))) , 


(39) 


where <f>(x) is the curvature perturbation in real space and $l(x) the Gaussian part of the perturbation ((<f>z.(x)) = 0). The 
non-linear coupling constant, }nl, will be related to the slope and curvature of the inflaton potential (salopek & Bond 199C; 
Gangui et al. 1994). For example, /jvz, « 3e — 2tj, where e and r/ are the slow-roll parameters and typically /jvz, ~ 10 _ . The 
dominant source of non-Gaussianity is the weak non-Gaussian gravitational growth of perturbations at early times: general 
relativistic second-order perturbation theory gives /nl ~ 0(1) (Pyne & Carroll 1996). More drastic changes to the evolution 


of the inflaton field, as long as they are brief, could still be compatible with Inflation and generate strong non-Gaussianity. 


An example of a potentially strongly non-Gaussian model is the curvaton model of 

Lytli & Wandf 

( 2002|). Some recent 

proposals for using a primordial power spectrum with a break or a bump (Barriga et al. 

2001. Griffiths et al. 

2000 

) can 


give a good fit to the current CMB data and the non-Gaussian features of such models could be used as a way to break 
the degeneracy with standard cosmological models. Wang & Kamionkowski ( |200(][ ) analyzed a simple model for an inflaton 
potential with a feature and showed the level of non-Gaussianity is still negligible, but it would be interesting to further explore 
this possibility. Contrary to standard adiabatic perturbations, isocurvature ones can easily generate non-Gaussianity. They 
are normally created during Inflation by vacuum fluctuations of a non-inflaton field, so that the flatness requirements for the 


potential no longer apply. Pure isocurvature perturbations are however ruled out by experiment (Trotta et al. 2001). Bartolo 


et al.[(2002) considered instead correlated adiabatic and entropy perturbations, which should lead to sizeable non-Gaussian 


features in both modes. Topological defects are also well known to generate non-Gaussianity. Cosmi c strings, for instance, ca n 
produce non-Gaussian temperature fluctuations through weak lensing: the Kaiser-Stebbins effect (Kaiser fe Stebbins 1984). 


A proposal to promote the scale invariance of the power spectrum to full conformal invariance (Antoniadis et al. 1997) also 


accommodates interesting non-Gaussian possibilities. In the current analysis we shall concentrate on the model described by 
equation (139) and only consider terms linear in f nl■ 
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Figure 9. Diag onal theoretical bispectrum. The stars indicate the theoretical values used to compare to experiment, after the relevant 
binning (eq. 281). 


5.1 Weak non-linear coupling 

In the case of primordial adiabatic scalar perturbations, the coefficients ae n 
d 3 k 


aim = 47r(—i) 


(27 


r$(k)Ai(fc)>7m(k), 


can be written as ( Ma fe Bertschinger 1995 ): 

(40) 


where A i{k) is the radiation transfer function. One could also calculate the isocurvature perturbation by considering instead 
the primordial entropy perturbation, the corresponding transfer function and possibly, an expression for this primordial 
fluctuation similar to equation (].')9|) . From this, one can then calculate the reduced bispectrum (Komatsu & Spergel 2001) 


itheory 


where 


= 2 f NL / r 2 dr [bf 1 (r)6f 2 (r-)6^ I '(r) + b^(r)bi 2 L (r)be 3 {r) + b^ L {r)bf 2 (r)bf 3 (r)] , 


bt(r) = 


k 2 dkP<s> (fc) A e ( k)ji ( kr ), 


(41) 


(42) 


U NL, \ 

be (r) 


k 2 dkAe(k)je(kr) 


(43) 


P$(k) is the primordial matter power spectrum and will have only one free parameter - /nl- All other cosmological 

parameters will be fixed by fitting the CMB power spectrum, Ci to the data. We used CMBFAST (Zaldarriaga & Seljak 1996) 

"”!) and (p| 


to obtain the radiation transfer function and calculate the integrals in equation (|42| ) and (|43|)n We g enerated the theor etical 
bispectrum by running this modified version of CMBFAST with the best fit parameters obtained in Balbi et al. (200C) and 
normalised P#(fc) to the MAXIMA-1 and COBE/DMR data. Note that (eq. [H| ) will depend on the square of the 

normalization used for the power spectrum, so that the choice of normalization will affect the value obtained for /nl- Also, 
when generating these values, we consider /jvl = 1. In Fig. |we show the diagonal components for the theoretical reduced 
bispectrum. We multiplied the values by £ 4 , which basically corresponds to introducing a factor of £{£ + 1) for each bf'(r), 
making it approximately constant for low £ as in the case of the Ci. Note that this is no longer possible for the non-diagonal 
components, due to the mixed multipoles (eq. fri]) and there is no standard factor we could use in that case. We are now in 
conditions to compare the theoretical values with the ones estimated from the data. 

2 The non existence of a P$(k) term in equation f p| ) requires some care when obtaining the transfer function: one should increase the 
limits of the time integration used to calculate A i(k). 
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Figure 10. Comparison between theory and experiment. The data points were obtained using the minimum variance estimator. 


5.1.1 Minimum variance estimator 


We should in principle use equation (|23|) for each value to translate from the theoretical bispectrum to the estimated bispec¬ 
trum. Unfortunately this is not practical since it would require the full knowledge of the window function, £ 3 )■ 

However, as can be seen in Fig. the theoretical bispectrum changes quite slowly along the width of the window function (see 
fig. |l]), so that we can safely extract the bispectrum from the integral (which then gives just 1). So, basically, we generated 
the theoretical bispectrum for each of the modes analyzed in section 3.2.3 (1409 in total) and combined them into bins in 
exactly the same way as before, using equation (|28|). We then fit the free parameter, /nl to the data by using the standard 
X 2 statistics: 


X 2 (/a 


E l fyobs n jyt 
\^ol JNL -Dq 

a,a' 


l\ sy-l ( fyobs n r>^\ 

) ^aa' ~JNL -&(*'), 


(44) 


where again, C aa < is the covariance matrix of the estimators obtained from the Monte-Carlo realizations. B^ 1 is the binned, 
calculated bispectrum from equation © without the f nl. Assuming a Gaussian distribution for the estimated bispectrum, 
we can obtain the maximum probability for fj ml just by minimizing the above \ 2 with respect to fi vz,: 


}nl = 


E 


ryobs s~i —1 7 yth 

B a C aa ,B a , 


E 


ryth —1 jryth 
Dr/ Cv / -Dry/ 
QQ <-* 


(45) 


The best fit value is fi vl ~ 1500. Using this value, we show in Fig. [l^ an interesting comparison between theory and 
experiment. It is clear that the estimated error is still too large to allow us to make a clear statement about the value of 
f nl- Even f nl = 0 seems reasonable when looking at this plot. However we expect the non-diagonal terms to have smaller 
error bars, which may bias the data to prefer a larger value for /nl ■ Applying the same minimization to each Monte-Carlo 
realization we obtain the distribution for the non-linear coupling parameter, /jvz,. This is shown in Fig. [Tl| The peak of the 
distribution is at zero, as expected for MC simulations of a Gaussian sky, so that the estimated /nl obtained by minimization 
of the standard \ 2 is i n f ac t unbiased. The 68 % confidence limit gives |/ivn| ~ 950. Although the analysis seems to indicate 
a value for f nl slightly larger than what one would expect for a Gaussian map, the discrepancy is not strong enough for a 
claim of detection of a non-zero fi ml (approximately 20 % of the values would be larger than the one obtained). 


5.1.2 FFT estimator 

The analysis using the data obtained by the frequentist estimator is quite similar to the one above. We again generate the 
theoretical bispectrum for each of the modes measured by this estimator and combine the values into bins in exactly the same 
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Figure 11. /jvz, distribution for the minimum variance estimator. 



Figure 12. Comparison between theory and experiment for the FFT estimator (non-diagonal components). The combination used for 
l\ and (.2 is supposed to give the smallest error bars. 


way as in section hi. The best fit value gives /jvl ~ 2700. Using these fit, we plot in Fig. the experimental values against 
the non-diagonal components of the theoretical bispectrum. The corresponding histogram distribution for Jnl is shown in 
Fig. The 68% confidence limit gives |/jvn| ~ 1650. As expected the constr aint on a non-zero J nl is larger than in the 
minimum variance estimator. This is a slightly weaker limit than obtained by Santos et al. ( 2002 ), but the previous limit 
assumed only a Sachs-Wolfe contribution to the bispectrum, whereas this analysis includes the full radiation transfer function. 
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Figure 13. /nl distribution for the FFT estimator. 


6 DISCUSSION AND CONCLUSIONS 

In this paper we have developed a minimum variance estimator for the flat sky approximation in the limit of weak non- 
Gaussianity. We have discussed its numerical implementation and substantiated some approximations that make the method 
faster. As a test we applied this estimator to the MAXIMA-1 CMB map and obtained a total of 1409 bispectrum modes 
combined into 216 bins. An analysis of these values using a \ 2 statistics indicates that the data is consistent with Gaussianity. 
We also reviewed the more simple FFT estimator and applied it to the same dataset so as to compare the two methods. 
Although the overall conclusions obtained from an analysis of the CMB map are the same, the FFT estimator shows a larger 
variance and higher correlations between the bispectrum modes than the previous method. This should convince us to look 
at the minimum variance cubic estimator as the method of choice to measure the bispectrum. However, contrary to the FFT 
estimator, this method is quite slow in computational terms making it more difficult to apply to large datasets. Fortunately the 
study presented in this paper also shows a possible way out: one should still use the standard frequentist method (equation |3^) 
but apply it instead to the inverse map, Zi = (f'ATfxj), where the 2-point correlation function of the temperature field, Gi, 
is already known from the power spectrum measurements. The obtained values should then be compensated by the relevant 
factor, e.g. equation J27[), to obtain the bispectrum. This makes the minimum variance estimator method viable to use with 
the large datasets from the next generation satellite experiments, the Microwave Anisotropy Probe (MAP) and Planck. 

We also used the measured bispectra to constrain cosmological models. We considered a primordial non-Gaussian curva¬ 
ture perturbation and generated the corresponding bispectrum taking into account the full radiation transfer function with the 
best fit cosmological parameters from the MAXIMA-1 experiment. We obtained a weak constraint on the non-linear coupling 
parameter, f nl■ The 68% confidence limit from the minimum variance estimator is |/jvl| < 950. This is the tightest limit on 
primordial non-Gaussianity in the CMB which includes the full effects of the radiation transfer function. The fit to the data 
from both methods, seems to indicate a value for / nl slightly larger than what one would expect from a Gaussian sky. If we 
were to ascribe this to a cosmological origin it would mean the slow-roll conditions during inflation were violated. However, 
the value obtained could be due to some foreground contamination, or just due to a statistical fluctuation (around 20% of all 
Gaussian skies could still yield a value for |/jvn| larger than the one obtained). 

The current variance for the bispectrum is still too large to allow a clean measurement of the bispectrum. The situation 
should improve with current and future satellite experiments and using the methods developed in this paper one will hopefully 
be able to put stringent constraints on / nl ■ One should bear in mind, however, that neither the MAP or Planck experiments 
will be able to detect primordial non-Gaussianity with the usual slow-roll assumption for Inflation. 
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APPENDIX A: ALL SKY AND FLAT SKY BISPECTRUM 

We want to establish the correspondence between the full sky bispectrum and the flat sky bispectrum. The approximation 
will be valid when analysing a small patch of the sky and so, to correctly establish the relation, we should take into account 
the effect of incomplete sky coverage on the bispectrum estimator. 


Al Incomplete sky coverage 

Incomplete sky coverage means the harmonic coefficients will be given by, 


atm = / dQAT(ti)W(Q)Y e * m (n) = 


dHAT(H)Y/ m (H), 


(Al) 


where W(H) defines the region of observation (1 in the region and 0 outside) and H 0 bs denotes the solid angle of the observed 
sky. Orthonormality of the spherical harmonics on the sky is therefore destroyed and the a| m will be related to the true 
harmonic transform, atm, through 


a tm — E E O'im 

£' =0 m' =—£' 


(A2) 


dmw(D)Y/ m (B). 

'S 

The estimated bispectrum will then be biased compared to the true bispectrum: 

= E b W 3 E 

all V all m' 

X J d^W^Y^m'^Y^m^l) 

x J dH 2 w(H 2 )Y,, m ,(n 2 )Y/ 2 m 2 (n 2 ) 
x J dfi 3 w(Q 3 )Y e , 3 m , 3 (n 3 )Y; 3 m 3 (n 3 ) 

» bi lta t 3 J dQW 3 (Q) Yt* mi (D)Y/ 2tn2 (n )Y/ 3 m 3 (fi). 

Where we have used, 

Gt 3 m\Q't 2 m‘ 2 (et 3 m 3 ) btit 2^3 dfl V)i mi (H)! 7 )^!?^ (H) I ^37713 (H) 

and Yt m (Q.)Ylm(Q') = <5(11 — f^)- One also needs to assume that bt 1 t 2 t 3 varies much more slowly than the coupling 
integral, so that we can take bt 1 t 2 t 3 outside the sum over (! (the sum over m! should peak at t'\ = h, 1*2 = £ 2 , £3 = tz)- This 
is the expression we should be comparing to the flat case. If we further convolve equation (AS) with the Wigner-3j symbol, 


(A3) 


(A4) 


/ £1 ti t 3 \ 

Vmi m 2 m3) 


it is possible to obtain the corresponding bias for the angular averaged bispectrum (Komatsu et al. 2002): 


{Be 1 ^ 3 ) 


Hobs 

47T 


B, 


eie 2 e 3 ’ 


(A5) 


(A6) 


so that the correction for the angular averaged bispectrum corresponds, as expected, approximately to the fraction of the sky 
observed. For the flat sky analysis, the measured Fourier modes will also be given by a convolution of the true ones, through 


a s (k) = 


(271 


■ W(k-k>(k'). 


(A7) 








Multiple Methods for Estimating the Bispectrum... 21 


Using equation (|7|) we then have, 
(a s (ki)a s (k 2 )a s (k3)> « B(k 1 , k 2 ,k 3 ) 

f d 2 fc( d 2 k’ 2 


W{ ki - ki) W’(k 2 - k' 2 ) W( k 3 + ki + k' 2 ) 


J (2 tt) 2 (27r) 2 
= B{k 1 ,k 2 ,k 3 ) J d 2 xtU 3 (x)e" ix (kl+k2+k3) , 


(AS) 


where again it was assumed the window function is sharp enough so that we can take B(k\,k 2 ,k 3 ) outside the integral. We 
are now in conditions to prove the correspondence between the all sky and the flat sky bispectrum. 


A2 All sky to Flat sky correspondence 


As in Hu (200C), let us start by showing the relation between the Fourier modes, a(k) and the harmonic coefficients, a^ m . For 
small angles around the pole (9 —> 0), one can approximate the spherical harmonics by 




im<f> 


(A9) 


where J m (£0) is a Bessel function. We can then relate the spherical harmonics with the plane wave, using the expansion: 
e Ux = J m {tx)e irn ^- M « 


y^J m Ye rn (x,(f> x )e 


(A10) 


where £ = {£ cos 4>i,£ sin ()>/) and £ can be seen as the continuous limit of the integer labeling the multipoles aim- This is valid 
for small x where x = {x cos(rf) x ), x sin(0 x )). Near the pole of spherical coordinates, defined by (0,0), we can lay down almost 
Cartesian coordinates by defining the 2-dimensional vector 9 = (9 cos(0), 9 sin(0)), and then use the correspondence x = 6. 
as already implied in the above expression. The inverse relation is then, 


Wm(^,0x) — 

Thus, 

a r(n) = 


irruf> £ i£-: 

~2E e e 


(All) 



^irrufz i£0 

~2E e e 


Q'£m€ 


irrupt 


i£0 


The corresponding Fourier mode should then be given by 
a{£) = 


27T 

T 


m irrub o 

aim.e 


for large l. Its inverse relation is: 


Q-im 


d4>i 


e im ^a(£). 


(A12) 


(A13) 


(A14) 


Finally, we can now work out the connection between the all sky and flat sky bispectrum. Using the above relation, 

-^1 I £ 2 / ^3 -7711+7712+777.3 -j 


/ s s s \ 

Y^£\ 7711 a e 2 7712 ^^ 3777-3 / 

f ri(+ d(/u 2 d(f>t. 


27T V 27T V 27T 


3 B(£ u £ 2 ,£ 3 ) 


2tv 2n 2n 


3_ e ~imicl>ei e -im 2 cl>e 2 e -im 3 <l> e3 / ^ 2 £ pj/ 3 ( X )g _ix '(U+^2+^ 3 ) 

J 


(A15) 


where equations ( A14 ) and (^^) were used. The integration involving the exponential, e * x will be done only for 
small values of x, due to the presence of the window function, so that it is a good approximation to decompose the exponential 
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in terms of the spherical harmonics (eq. A10). Further integrating over the azimuthal angles 
to 

(Hfjra! a t 2 m 2 a e 3 m 3 > & B(l 1,^2,4) J d, 2 X W(x ) 3 Y/ imi (x)Y 4 * ma (x)^*^ (x). 


Comparing this to equation (A3) we obtain the desired relation (eq. g) 
be 1 i 2 e 3 « B(l 1 , £ 2 , £ 3 ) ■ 


j, collapses the sum 
(A16) 

(A17) 


APPENDIX B: BEST CUBIC ESTIMATOR: ALL SKY 


In this appendix we will calculate the optimal estimator for the reduced bispectrum be 1 e 2 e 3 in the full sky setup; Heaven: 


(1998) calculated the optimal estimator for the third moment, but clearly it contains no more information for a statistically 
isotropic field. 

Again, we consider quantities, y a , cubic in the AT) 


y.. = ^2 E? jk ATiATjATk, 

pixels ijk 


(Bl) 


with a = G}- To obtain the mean of y a we have to consider the 3-point correlation function, which may be written in 

terms of the reduced bispectrum as follows: 


f-iijk = {ATiATjATk) = 


^1 ZzVsQijk 


*1 «2 13 


where 

Qi jk = w ei Wt 2 w e3 


Qzzr 3y ^ x mYe 2 m 2 (nj)Y tama 




(B2) 


(B3) 


We have assumed that the noise has a zero 3-point function. If it is known and non-zero, it may be added. The effect of 
beam-smearing is taken into account through window functions, We, multiplying the ae m and hence affecting our estimate 
of be 1 e 2 e 3 . Note that Gangui & Martin (2000a) made some progress in defining an estimator for the bispectrum, but present 
final results only for all-sky coverage. 


The remainder of the analysis follows closely that in Heavens (199f). We minimize the variance of y, which involves the 
6 -point function. The ensemble average of y is 


(Voi) — ^ ' boc'Q 

a'ijk 


F a - 

xj k ■L-'ij k • 


(B4) 


The covariance between the ys is C aa i = {y a y a ') — {ya){y a ') which we obtain from the triplet data covariance matrix: 

{ATiATjATkAT t iATjiAT k i) - {AT i AT j AT k ){AT i ,AT y AT k ,). (B5) 

We again assume that the field is close to a Gaussian, and approximate the covariance matrix by the covariance matrix for a 
Gaussian field with the same power spectrum. This assumes that the bispectrum is small compared with the cosmic variance, 
and also assumes that the connected 4-point function is small. The method is therefore optimal for testing the hypothesis that 
the field is Gaussian. For non-Gaussian fields it should be an accurate estimator provided the bispectrum is small compared 


with the cosmic r.m.s., which is the case for most models considered (an exception is the curvaton model of Lyth & Wand: 
2005, where the amplitude of the bispectrum is related to other parameters and may be large). 

In the Gaussian approximation, (ATiATjATk) = 0, and we use Wick’s theorem to write 


(ATiATjAT k ATii ATji AT k >) = 

+ permutations (15 terms). 

where we have defined the 2-point function of the temperature field: 
21 + 1 


Zij = {^TiATj) = 


£ 


47T 


■ CtPe(cos"iij)We + N i: 


(B6) 


(B7) 


Nij is the noise covariance matrix, and Ci = (|a£ m | 2 ) is the angular power spectrum. Repeating exactly the analysis in section 
2 of Heavens| (1998), we obtain the weights for the optimal estimator for the reduced bispectrum, by minimizing the error on 
y a . The results only are quoted here. The optimal weights are 

po _ (j_ t-lf-lf-l /qa_ 1 

ijk gWSu'SM'Wj'fc' 2(2 + 3n) 


Gkjk&lQh'k' 


(B8) 
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where the summation convention is assumed and n is the number of pixels used. If we plug this expression into equation (Bl) 
and assume the second piece (with the factor 2 ( 2 + 3 n) l is negligible, we obtain 

vt 1^3 = \w tl W^W, 3 Y ZiZ i 


m i m 2 m 3 


= -W ei W e2 W t:i I dne tl (Q)e t2 (n)et 3 {Q), 

with Zi = f-fATi/ and 

(H) ~ ^ ^ I Em (Hi) I Em (H) • 


(B9) 


(BIO) 


We used equation (|4j) for the Gaunt integral. The sum -Zilfm(fL) over the pixels is basically the harmonic transform of the 
map Zi. The actual estimator for the reduced bispectrum is 


ba — F aa ,y a i 

and the Fisher matrix is then 


(Bll) 


F aa ' — C ijki' j'k'QijkQi' i'k' 


= [&»' (6£jj'£fcfc' + 9^-fc^'fc')] 1 QijkQi'j’k' 


(B12) 


The proof that this weighting scheme is lossless (i.e. y a contains as much information as the entire pixel dataset) follows by 
the Fisher matrix approach in Heavens (199S). The method is lossless in the sense that the Fisher matrices coincide, so the 


likelihood surfaces are locally identical. 


APPENDIX C: BEST CUBIC ESTIMATOR: FLAT CASE 

We will now derive the precise steps to obtain the best cubic estimator in the flat-sky approximation. We need first to calculate 
the 3-point correlation function of the temperature fluctuations: 


(AT a (x;) AT a (xj)AT a (x*,)) = W(xO W(x J -) W(x fe ) 

d 2 ki f d 2 k 2 


7Y-V2 I 7Y-V2 B ( fe i- l k i + M) B(fci) «(&) ®(l k i + k 2 1) e ikl Xi e ik2 Xj e- i(kl+k2) Xfe . 

(2J (2 

The window function W (x,), will just correspond to Os and Is, so as to define the field of observation, and taking into account 
the rotational symmetry of the bispectrum: 

poo poo P7T Q 1 J 

(AT , a (xi)AT , a (xj)AT , a (xfc)) = / dki dk 2 d6 —^ B(ki, k 2 , |ki + k 2 |) 


(2ti 

P7T 

xB(fci) B(k 2 ) B(|ki + k 2 |) / d9 1 7le{G yfc (fci, k 2 ,9,0i) + G ijk (k u k 2 , -9, ft)} 


where, 


G ijk (k 1 ,k 2 ,e,9 1 ) = e ikl ' x * e ik2 e - ikl ' Xfe e" ik2 Xfc 


(Cl) 


and ki = (k\ cosft, ki sinft), k 2 = (k 2 cos(0\ + 9),k 2 sin(9i + 9)). Let us now do a simple variable transformation, ft = 
ki, t 2 = k 2 , ft = ^k\+kl + 2k\k 2 cos (9) so that Gijk(ki,k 2 ,9,9i) + Gijk{ki, k 2 , — 9, 9i) —► Gftfc(ft,ft,ft, ft). Using the 
permutation symmetry of the bispectrum, we finally get, 

poo P^2 

(AT a (xi)AT a (xj)AT a (xfe)) = / dft / dft / dftB(ft,ft,ft) 


xB(£i)B(£ 2 )B(l 3 ) Y / d0 1 'R.e{R ijk {£ 1 ,£ 2 ,£ 3 ,9 1 )}, 


perm. 

Pl.<2.*3> 


with, 

Rijk{£ l,ft,ft, ft) = 


4 t\£ 2 t 3 


■■ Gijk(£9i). 


(2nY^{2£ 1 £ 2 y-{£l-£l-ll) 

Looking back at equation (hoh it is possible to read off exactly what should be the expression for Qijftft, ft, ft): 


(C2) 


(C3) 
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Qijk(hj2,e 3 ) = B{e 1 )B{£2)B{e 3 ) J2 / 


perm. 


Then the expression for y a , or y(£ i,£ 2 ,£ 3 ) in the flat case, will be 
y(£i,£ 2 ,£ 3 ) = B(£ 1 )B(£ 2 )B(£ 3 )^ ^ jT d0 1 TZe{R ilfk ,{£ 1 ,£ 2 ,£ 3 ,e 1 ) 


xiu' 


t-ie - 1 _ 3 t-v 

S97'Sfcfc / o i 0„ Slfc >7 -/ 


perm. 

- 1^-1 


AT’ s (xi)AT s ( x j)AT s (xfc)} 


— 2 + 3n “N'fc'J 
The ensemble average of y gives 

r°° rS r e 2 

(y(£i,£ 2 ,£s))= d£[ . dt' 2 df! 3 F(l x ,£ 2 ,hAAA)B{t! x ,£!^(! 3 ) 


(C4) 


(C5) 


(C6) 


With 

F(£i,£ 2 ,£ 3 ,£[,£ 2 ,£ 3 ) = (y(£i,£ 2 ,£ 3 )y(£' 1 ,£' 2 ,£ 3 )) 

= B(£i)B(£ 2 )B(£ 3 ) i ^ [ dOille{Ri jk {£i,l 2 ,i 3 ,0i)} 




HiktLJ'l' B{£' X )B{£' 2 )B{£' 3 ) 


2 + 3 n 

/• 7T 

perm. ® 

In the paper we used the following estimator for the bispectrum: 
y{£ i,£z,£ 3 ) 


B(£i,£ 2 , £ 3 ) = 


where 


M(£ u £ 2 ,£ 3 )’ 


M(£ !,e 2 ,£ 3 ) = J d £‘1 J e[ d£' 2 J d£' 3 F(£ 1 ,£ 2 ,£ 3 ,£' 1 ,£' 2 ,£' 3 ) 


d 2 x £>(|x;/ - x|)23(|xy - x|)B(|x fc / - x|) 


A dd 1 Ke{R i iji k f(£i,£ 2 ,£ 3 ,9 x )} 




'■ £jk ij'k' 


B{£i)B(£ 2 )B(£ 3 ). 


2 + 3 N ^ 

If we approximate the beam window function by a Gaussian, 
S(|x|) = 

the above integral will simplify, giving 


M{h,£ 2 ,£ 3 ) = A s 27[£_ e - 5 i I [ | x 1 ,-* il | = , + |x 1 ,-* Jl/ |= , T | ^,- 3t|1 . 


- x fc'l 


j d GBe{R i'j'k' 
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<& [e,y^ - m)B(£ 2 )B(£ 3). 


(C7) 


(C8) 


(C9) 


(CIO) 


(Cll) 

Note that the beam window function B(|x|) quickly becomes negligible for large |x|. Basically this means the sum above does 
not have to be done for all i! j' k !, so that we end up with a sum of approximately order n instead of the original n 3 . 



